AITopics | negotiable reinforcement learning

Negotiable Reinforcement Learning for Pareto Optimal Sequential Decision-Making

Neural Information Processing Systems | Mar-16-2026, 20:56:58 GMT

It is commonly believed that an agent making decisions on behalf of two or more principals who have different utility functions should adopt a Pareto optimal policy, i.e. a policy that cannot be improved upon for one principal without making sacrifices for another. Harsanyi's theorem shows that when the principals have a common prior on the outcome distributions of all policies, a Pareto optimal policy for the agent is one that maximizes a fixed, weighted linear combination of the principals' utilities. In this paper, we derive a more precise generalization for the sequential decision setting in the case of principals with different priors on the dynamics of the environment. We refer to this generalization as the Negotiable Reinforcement Learning (NRL) framework. In this more general case, the relative weight given to each principal's utility should evolve over time according to how well the agent's observations conform with that principal's prior. To gain insight into the dynamics of this new framework, we implement a simple NRL agent and empirically examine its behavior in a simple environment.
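The weight dynamics described in the abstract can be illustrated with a minimal sketch. It assumes, as one plausible reading of the abstract rather than the authors' implementation, that each principal's weight is multiplied by the probability that principal's prior assigned to the latest observation and then renormalized. The function names `update_weights` and `choose_action`, and the tabular layout of `expected_utilities`, are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def update_weights(weights: Sequence[float],
                   likelihoods: Sequence[float]) -> List[float]:
    """Rescale each principal's weight by the likelihood that principal's
    prior assigned to the latest observation, then renormalize to sum to 1.

    Assumption: multiplicative (Bayesian-style) reweighting; this is a
    sketch of the idea in the abstract, not the paper's exact rule.
    """
    if len(weights) != len(likelihoods):
        raise ValueError("need one likelihood per principal")
    scaled = [w * p for w, p in zip(weights, likelihoods)]
    total = sum(scaled)
    if total == 0:
        raise ValueError("every principal assigned zero probability")
    return [s / total for s in scaled]


def choose_action(weights: Sequence[float],
                  expected_utilities: Sequence[Sequence[float]]) -> int:
    """Return the index of the action that maximizes the weighted sum of
    the principals' expected utilities.

    expected_utilities[i][a] is principal i's expected utility for action a
    (a hypothetical tabular representation).
    """
    n_actions = len(expected_utilities[0])
    scores = [
        sum(w * u[a] for w, u in zip(weights, expected_utilities))
        for a in range(n_actions)
    ]
    return max(range(n_actions), key=scores.__getitem__)


if __name__ == "__main__":
    # Two principals start with equal weight; the observation was far more
    # likely under principal 0's prior, so the weight shifts toward it.
    w = update_weights([0.5, 0.5], [0.9, 0.1])
    print(w, choose_action(w, [[1.0, 0.0], [0.0, 1.0]]))
```

With fixed weights (skipping `update_weights`), `choose_action` reduces to the fixed linear combination of Harsanyi's theorem; the update step is what lets the weights track how well each principal's prior predicts the observations.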

artificial intelligence, machine learning, reinforcement learning, (8 more...)

Neural Information Processing Systems

Technology:

Information Technology > Game Theory (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.44)

Negotiable Reinforcement Learning for Pareto Optimal Sequential Decision-Making

Nishant Desai, Andrew Critch, Stuart J. Russell

Neural Information Processing Systems | Feb-12-2026, 21:58:31 GMT

Neural Information Processing Systems http://nips.cc/

agent, pomdp, reinforcement learning, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Alameda County > Berkeley (0.05)
North America > Canada > Quebec > Montreal (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.76)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Negotiable Reinforcement Learning for Pareto Optimal Sequential Decision-Making

Neural Information Processing Systems | Nov-20-2025, 22:17:10 GMT

It is commonly believed that an agent making decisions on behalf of two or more principals who have different utility functions should adopt a Pareto optimal policy, i.e. a policy that cannot be improved upon for one principal without making sacrifices for another. Harsanyi's theorem shows that when the principals have a common prior on the outcome distributions of all policies, a Pareto optimal policy for the agent is one that maximizes a fixed, weighted linear combination of the principals' utilities. In this paper, we derive a more precise generalization for the sequential decision setting in the case of principals with different priors on the dynamics of the environment. We refer to this generalization as the Negotiable Reinforcement Learning (NRL) framework. In this more general case, the relative weight given to each principal's utility should evolve over time according to how well the agent's observations conform with that principal's prior. To gain insight into the dynamics of this new framework, we implement a simple NRL agent and empirically examine its behavior in a simple environment.

name change, negotiable reinforcement learning, pareto optimal sequential decision-making, (5 more...)

Neural Information Processing Systems

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Negotiable Reinforcement Learning for Pareto Optimal Sequential Decision-Making

Nishant Desai, Andrew Critch, Stuart J. Russell

Neural Information Processing Systems | Nov-20-2025, 16:42:34 GMT

Harsanyi's theorem shows that when the principals have a

agent, pomdp, reinforcement learning, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Alameda County > Berkeley (0.05)
North America > Canada > Quebec > Montreal (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.76)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Reviews: Negotiable Reinforcement Learning for Pareto Optimal Sequential Decision-Making

Neural Information Processing Systems | Oct-8-2024, 21:06:08 GMT

Summary: This paper reasons about a Pareto optimal social choice function in which the principals seek to agree on how to use a system that acts in a sequential decision-making problem, where the principals may not share the same prior beliefs. Results suggest that, to obtain such a function, the mechanism must over time make choices that favor the principal whose beliefs appear to be more correct. Quality: The work appears to be correct as far as I have been able to discern. However, I do not like the idea of omitting the proof of the main theorem (Theorem 4) from the main paper, even for the sake of brevity. My opinion is that if the theorem is that important, its proof should be next to it.

negotiable reinforcement learning, pareto optimal sequential decision-making, sequential decision problem, (7 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Negotiable Reinforcement Learning for Pareto Optimal Sequential Decision-Making

Neural Information Processing Systems | Oct-8-2024, 16:29:32 GMT

It is commonly believed that an agent making decisions on behalf of two or more principals who have different utility functions should adopt a Pareto optimal policy, i.e. a policy that cannot be improved upon for one principal without making sacrifices for another. Harsanyi's theorem shows that when the principals have a common prior on the outcome distributions of all policies, a Pareto optimal policy for the agent is one that maximizes a fixed, weighted linear combination of the principals' utilities. In this paper, we derive a more precise generalization for the sequential decision setting in the case of principals with different priors on the dynamics of the environment. We refer to this generalization as the Negotiable Reinforcement Learning (NRL) framework. In this more general case, the relative weight given to each principal's utility should evolve over time according to how well the agent's observations conform with that principal's prior.

negotiable reinforcement learning, pareto optimal policy, pareto optimal sequential decision-making, (2 more...)

Neural Information Processing Systems

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Negotiable Reinforcement Learning for Pareto Optimal Sequential Decision-Making

Nishant Desai, Andrew Critch, Stuart J. Russell

Neural Information Processing Systems | Feb-14-2020, 15:26:39 GMT

It is commonly believed that an agent making decisions on behalf of two or more principals who have different utility functions should adopt a Pareto optimal policy, i.e. a policy that cannot be improved upon for one principal without making sacrifices for another. Harsanyi's theorem shows that when the principals have a common prior on the outcome distributions of all policies, a Pareto optimal policy for the agent is one that maximizes a fixed, weighted linear combination of the principals' utilities. In this paper, we derive a more precise generalization for the sequential decision setting in the case of principals with different priors on the dynamics of the environment. We refer to this generalization as the Negotiable Reinforcement Learning (NRL) framework. In this more general case, the relative weight given to each principal's utility should evolve over time according to how well the agent's observations conform with that principal's prior.

negotiable reinforcement learning, pareto optimal policy, pareto optimal sequential decision-making, (2 more...)

Neural Information Processing Systems

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Negotiable Reinforcement Learning for Pareto Optimal Sequential Decision-Making

Nishant Desai, Andrew Critch, Stuart J. Russell

Neural Information Processing Systems | Dec-31-2018

It is commonly believed that an agent making decisions on behalf of two or more principals who have different utility functions should adopt a Pareto optimal policy, i.e. a policy that cannot be improved upon for one principal without making sacrifices for another. Harsanyi's theorem shows that when the principals have a common prior on the outcome distributions of all policies, a Pareto optimal policy for the agent is one that maximizes a fixed, weighted linear combination of the principals' utilities. In this paper, we derive a more precise generalization for the sequential decision setting in the case of principals with different priors on the dynamics of the environment. We refer to this generalization as the Negotiable Reinforcement Learning (NRL) framework. In this more general case, the relative weight given to each principal's utility should evolve over time according to how well the agent's observations conform with that principal's prior. To gain insight into the dynamics of this new framework, we implement a simple NRL agent and empirically examine its behavior in a simple environment.
